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Abstract 

In this paper, we study the impact of selection methods in the context of on-line 
on-board distributed evolutionary algorithms. We propose a variant of the rnEDEA 
algorithm in which we add a selection operator, and we apply it in a task-driven sce¬ 
nario. We evaluate four selection methods that induce different intensity of selection 
pressure in a multi-robot navigation with obstacle avoidance task and a collective 
foraging task. Experiments show that a small intensity of selection pressure is suf¬ 
ficient to rapidly obtain good performances on the tasks at hand. We introduce 
different measures to compare the selection methods, and show that the higher the 
selection pressure, the better the performances obtained, especially for the more 
challenging food foraging task. 


1 Introduction 

Evolutionary robotics (ER) ([9j) aims to design robotic agents’ behaviors using evo¬ 
lutionary algorithms (EA) ([7j). In this context, EA’s are traditionally seen as a 
tool to optimize agents’ controllers w.r.t to an explicit objective function (fitness). 
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This process is carried out in an off-line fashion; once the behavior is learned and 
the controller optimized, the agents are deployed and their controllers’ parameters 
remain fixed. 

On the other hand, on-line evolution f|15| j takes a different approach in which 
behavior learning is performed during the actual execution of a task. In these al¬ 
gorithms, learning or optimization is a continuous process, i.e. robotic agents are 
constantly exploring new behaviors and adapting their controllers to new conditions 
in their environment. Usually, this is referred to as adaptation. 

These two visions of ER can be related to on-line and off-line approaches in 
Machine Learning (ML). Off-line ML algorithms learn a specific task and solutions 
should generalize to unseen situations after the learning process, whereas on-line ML 
algorithms progressively adapt solutions to new presented situations while solving 
the task. In this sense, both on-line ML algorithms and on-line EA’s perform lifelong 
adaptation or learning, to possibly changing environments or objectives. 

In this paper, we focus on on-line distributed evolution of swarm robotic agents. 
We are interested in learning individual agents behaviors in a distributed context 
where the agents adapt their controllers to environmental conditions independently 
while deployed. These agents may locally communicate with each other and do not 
have a global view of the swarm. In this sense, this approach finds many ties with 
Artificial Life, where the objective is to design autonomous organisms that adapt to 
their environment. 

On-line distributed evolution may be viewed as distributing an EA on the swarm 
of agents. Traditional evolutionary operators (mutation, crossover etc.) are per¬ 
formed on the agents and local communication ensures the spread of genetic material 
in the population of agents. 

In EA’s, selection operators drive evolution toward fit individuals by controlling 
the intensity of selection pressure to solve the given task. These operators and their 
impact on evolutionary dynamics have been extensively studied in off-line contexts 
([7]). In this paper, we study their impact in on-line distributed ER, where evolu¬ 
tionary dynamics are different to the off-line case: selection is performed locally on 
partial populations and fitness values on which selection is performed are not reli¬ 
able. Our experiments show that, in this context, a strong selection pressure leads 
to the best performances, contrary to classical approaches in which lower selection 
pressure is preferred, to maintain diversity in the population. This result suggests 
that, in distributed ER algorithms, diversity is already maintained by the disjoint 
sub-populations. 

Several authors have addressed on-line evolution of robotic agent controllers in 
different contexts: adaptation to dynamically changing environments ([5]), param¬ 
eter tuning (El), evolution of self-assembly (0) , communication (El), phototaxis 
and navigation on, m)- Some of this work is detailed in the next section. The 
authors use different selection mechanisms inducing different intensities of selection 
pressure to drive evolution. In this paper, we compare different selection opera¬ 
tors and measure the impact they have on the performances of learning two swarm 
robotics tasks: navigation with obstacle avoidance and collective food foraging. 

We begin by reviewing different selection schemes proposed in the context of on¬ 
line distributed ER and then we present the algorithm that will serve as a test bed, 
along with the selection methods we compare. In the fourth section, we detail our 
experimental setting and discuss the results. Finally, we close with some concluding 
remarks and future directions of research. 
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2 Related Work 


In the following, we review several on-line distributed ER algorithms and discuss the 
selection mechanisms that were applied to ensure the desired intensity of selection 
pressure in order to drive evolution. 

A common characteristic of on-line distributed ER algorithms is that each agent 
has one controller at a time, that it executes (the active controller), and locally 
spreads altered copies of this controller to other agents. In this sense, agents have 
only a partial view of the population in the swarm (a local repository). Fitness as¬ 
signment or evaluation of individual chromosomes is performed by the agents them¬ 
selves and is thus noisy, as different agents evaluate their active controllers in differ¬ 
ent conditions. Selection takes place when the active controller is to be replaced by 
a new one from the repository. 

PGTA (Probabilistic Gene Transfer Algorithm) introduced by |15| . is commonly 
cited as the first implementation of a distributed on-line ER algorithm. This algo¬ 
rithm evolves the weights of fixed-topology neural controllers and agents exchange 
parts (genes) of their respective chromosomes using local broadcasts. The algorithm 
considers a virtual energy level that reflects the performance of the agent’s con¬ 
troller. This energy level increases every time the agents reach an energy source and 
decreases whenever communication takes place. Furthermore, the rate at which the 
agents broadcast their genes is proportional to their energy level and conversely, the 
rate at which they accept a received gene is inversely proportional to their energy 
level. This way, selection pressure is introduced in that fit agents transmit their 
genes to unfit ones. 

[12j introduced odNEAT, an on-line distributed version of NEAT (Neuro-Evolution 
of Augmenting Topologies) f|13jb where each agent has one active chromosome that 
is transmitted to nearby agents. Collected chromosomes from other agents are stored 
in a local repository within niches of species according to their topological similar¬ 
ities, as in NEAT. Each agent has a virtual energy level that increases when the 
task is performed correctly and decreases otherwise. This energy level is sampled 
periodically to measure fitness values and, whenever this level reaches zero, the ac¬ 
tive chromosome is replaced by one in the repository. At this point, a species is 
selected based on its average fitness value, then a chromosome is selected within this 
species using binary tournament. Each agent broadcasts its active chromosome at a 
rate proportional to the average fitness of the species it belongs to. This, added to 
the fact that the active chromosome is selected from fit niches, maintains a certain 
selection pressure toward fit individuals. 

EDEA (Embodied Distributed Evolutionary Algorithm) ([8]), was applied to 
different swarm robotics tasks: phototaxis, navigation with obstacle avoidance and 
collective patrolling. In this algorithm, each agent possesses one chromosome, whose 
controller is executed and evaluated on a given task. At each iteration, agents broad¬ 
cast their chromosomes alongside with their fitness to other nearby agents with a 
given probability (fixed parameter). Upon reception, an agent selects a chromosome 
from those collected using binary tournament. This last chromosome is then mu¬ 
tated and recombined (using crossover) with the current active chromosome with 
probability s ^xf(x) ’ w h ere f( x ') i s the fitness of the selected chromosome, f(x) is 
the fitness of the agent’s current chromosome and s c is a scalar controlling the in¬ 
tensity of selection pressure. To ensure an accurate measure of fitness values, agents 
evaluate their controllers for at least a minimum period of time (maturation age), 
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during which agents neither transmit nor receive other chromosomes. 

With nrEDEA (minimal Environment-driven Distributed Evolutionary Algo¬ 
rithm), |3j address evolutionary adaptation with implicit fitness, i.e. without a task- 
driven fitness function. The algorithm takes a gene perspective in which successful 
chromosomes are those that spread over the population of agents and which requires: 
1) to maximize mating opportunities and 2) to minimize the risk for agents (their 
vehicles). 

At every time step, agents execute their respective active controllers and locally 
broadcast mutated copies of the corresponding chromosomes. Received chromo¬ 
somes (transmitted by other agents) are stored in a local list. At the end of the 
execution period (lifetime), the active chromosome is replaced with a randomly se¬ 
lected one from the agent’s list and the list is emptied. An agent dies if there are no 
chromosomes in its list (if it did not meet other agents) and it remains dead until it 
receives a chromosome from another agent passing by. 

The authors show that the number of living agents rises with time and remains 
at a sustained level. Furthermore, agents develop navigation and obstacle avoid¬ 
ance capabilities that allow them to better spread their chromosomes. This work 
shows that environment-driven selection pressure alone can maintain a certain level 
of adaptation in a swarm of robotic agents. A slightly modified version of this algo¬ 
rithm is used in this work and is detailed in the next section. 

|10| proposed MONEE (Multi-Objective aNd open-Ended Evolution), an exten¬ 
sion to mEDEA adding a task-driven pressure as well as a mechanism (called mar¬ 
ket) for balancing the distribution of tasks among the population of agents, if several 
tasks are to be tackled. Their experiments show that MONEE is capable of improv¬ 
ing nrEDEA’s performances in a collective concurrent foraging task, in which agents 
have to collect items of several kinds. 

The authors show that the swarm is able to adapt to the environment (as mEDEA 
ensures), while foraging different kinds of items (optimizing the task-solving behav¬ 
ior). In this context, each type of item is considered a different task. The algorithm 
uses an explicit fitness function in order to guide the search toward better performing 
solutions. The market mechanism, which takes into account the scarcity of items, 
ensures that agents do not focus on the most frequent kind of items (the easiest 
task), thus neglecting less frequent ones. In their paper, the agent’s controller is se¬ 
lected using rank-based selection from the agent’s list of chromosomes. The authors 
argue that when a specific task is to be addressed, a task-driven selection pressure is 
necessary. This idea is discussed in the remainder of this paper. 

In the aforementioned works, authors used different classical selection operators 
from evolutionary computation in on-line distributed ER algorithms. It is however 
not clear if these operators perform in the same fashion as when they are used in an 
off-line non-distributed manner. In an on-line and distributed context, evolutionary 
dynamics are different, since selection is performed locally at the agent level and 
over the individuals whose vehicles had the opportunity to meet. In addition, and 
this is not inherent to on-line distributed evolution but to many ER contexts, fitness 
evaluation is intrinsically noisy as the agents evaluate their controllers in different 
conditions, which may have a great impact on their performance. A legitimate ques¬ 
tion one could ask is: does it still make sense to use selection? 

In this paper, we compare different selection methods corresponding to different 
intensities of selection pressure in a task-driven context. We apply these methods 
in a modified version of nrEDEA and measure their impact on two different swarm 
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robotics tasks. 


3 Algorithms 

In this section, we describe the variant of rnEDEA we used in our experiments (Al¬ 
gorithm [[]). It is run by all the agents of the swarm independently in a distributed 
manner. At any time, each agent possesses a single controller which is randomly 
initialized at the beginning of evolution. 

The main difference w.r.t. rnEDEA is that the algorithm alternates between two 
phases, namely an evaluation phase, in which the agent runs, evaluates and trans¬ 
mits its controller to nearby listening agents, and a listening phase, in which the 
agent does not move and listens to incoming chromosomes, sent by nearby agents. 

The evaluation and the listening phases last T c and T\ respectively, and, for different 
robots, they take place at different moments. Since the different robots are desyn¬ 
chronized, robots in the evaluation phase are able to spread their genomes to other 
robots that are in the listening phase. 

If only one common phase takes place, an agent that turns on the spot transmits 
its controller to any fitter agent crossing it, as broadcast and reception are simulta¬ 
neous. This separation in two phases is inspired from MONEE where it is argued 
that it lessens the spread of poorly achieving controllers. Also, task-driven selection 
was introduced in MONEE to simultaneously tackle several tasks. 

The agent’s controller is executed and evaluated during the evaluation phase. 

For each agent, this phase lasts T e time-steps at mos10. During this phase, at each 
time-step the agent executes its current controller by reading the sensors’ inputs 
and computing the motors’ outputs. The agent also updates the fitness value of the 
controller, based on the outcome of the its actions, and locally broadcasts both the 
chromosome corresponding to its controller and its current fitness value. 

Once the T e evaluation steps are elapsed, the agent begins its listening phase, 
which lasts T\ time-steps. During this phase, the agent stops and listens for incom¬ 
ing chromosomes from nearby passing agents (agents that are in their evaluation 
phase). These chromosomes are transmitted along with their respective fitness val¬ 
ues. Consequently, at the end of this phase, an agent has a local list of chromosomes 
and fitnesses, or local population. Another difference w.r.t rnEDEA is that the local 
population also contains the agent’s current chromosome. This is done to ensure 
that all agents always have at least one chromosome in their respective populations, 
which happens particularly when an agent is isolated during its listening phase and 
does not receive any other chromosome. In rnEDEA, isolated agents stay inactive 
until they receive a chromosome from another agent passing by. 

After the listening period, the agent needs to load a new controller for its next 
evaluation phase. To do so, it selects a chromosome from its list using one of the 
selection methods discussed further. The selected chromosome is then mutated and 
becomes the agent’s active controller. In this case, mutation consists in adding a 
normal random variable with mean 0 and variance a 2 to each gene (each synaptic 
weight of the neuro-controller). 

Once the next controller is chosen, the list is emptied. This means selection is 
performed on a list of chromosomes that have been collected by the agent during the 

: A little random number is substracted from T e so as the evaluation phases of the agents are not 
synchronized. 
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Algorithm 1 mEDEA 

1: g a := randomi) 

2: while true do 
3: := 0 

4: // Evaluation phase 

5: for t — 1 to T c do 

6: exec(g a ) 

7: broadcast(g a ) 

8 : end for 

9: // Listening phase 

10: for t — 1 to Ti do 

11: := [J listenQ 

12: end for 

13: := U {g a } 

14: selected := selecti) 

15: g a \= mutate(selected) 

16: end while 


previous listening phase. At this time, the new controller’s evaluation phase begins. 
We consider one iteration of the algorithm (evaluation plus listening phase) as one 
generation. 

The selection method selects the new chromosome among the collected ones 
based on their fitness. This can be done in different manners, depending on the 
desired intensity of selection pressure. In this paper we compare four different selec¬ 
tion methods, each one defining a different intensity of task-driven selection pressure. 
The choice of these selection methods aims at giving a large span of intensities of 
selection pressure, from the strongest (Best), to the lowest (Random): 

Best Selection: This method deterministically selects the controller with the 
highest fitness. This is the selection method with the strongest selection pressure, 
as the agent will never be allowed to select a controller with a lower fitness than the 
previous one. Best selection can be compared to an elitist selection scheme where 
the best fit controllers are always kept. 


Algorithm 2 Best selection 
1: order Xi and index as such that: 

/Ol:n) > f(x 2:n ) > > f(x n:n ) 

2: return x\ :n 


Rank-Based Selection: In this case, selection probabilities are assigned to 
each controller according to their rank, i.e. the position of the controller in the list, 
once sorted w.r.t. fitness values. The best controller has the highest probability of 
being selected; however, less fit controllers still have a positive probability of being 
selected. Traditionally, this method is preferred to Roulette Wheel selection that 
assigns individuals probabilities proportional to their fitness values, which highly 
biases evolution toward best individuals. 
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Algorithm 3 Rank-based selection 
1: order Xi and index as Xi :n such that: 
/Ol:n) > f(x 2:n) > > f(x n:n ) 

2: select x i:n with probability Pr(x i:n ) = 1 

3: return x i:n 


Binary Tournament: This method uniformly samples a number of controllers 
equal to the size of the tournament (two in our case) and selects the one with the 
highest fitness. Here, the selection pressure is adjusted through the size of the tour¬ 
nament: the higher the size, the higher the selection pressure, the extreme case 
being when the tournament size is equal to the size of the population. In this case, 
the best controller is choser^]. Conversely, when the size of the tournament is two, 
the induced selection pressure is the lowest. 


Algorithm 4 £;-Tournament selection 
1: uniformly sample k x^ noted {xi,k, • • •, Xk,k} 
2: order Xi± such that: 

f(xi-.k) > f(x2-. k ) > ■ > f(x k:k ) 

3: return X\± 


Random Selection: This method selects a controller in the local population at 
random, disregarding its fitness value and therefore inducing no task-driven selection 
pressure at all. Randon selection is considered as a baseline for comparisons with 
the other methods that effectively induce a certain task-driven selection pressure. 
As discussed in the previous section, this is the selection operator used by rnEDEA 
for evolving survival capabilities of the swarm without any task-driven explicit goal. 
By considering Random in our experiments, we aim to compare the original mEDEA 
selection scheme with more selective operators. 

Each one of these four selection methods induces a different intensity of selec¬ 
tion pressure on the evolution of the swarm. In the next section, we describe our 
experiments comparing the impact of each one of these intensities. 


4 Experiments 

We compare these selection methods on a set of experiments in simulation for two 
different tasks, fast-forward navigation and collective foraging, which are two well- 
studied benchmarks in swarm robotics. All our experiments were performed on the 
RoboRobo simulator ([4]). 

4.1 Description 

In all experiments, a swarm of robotic agents is deployed in a bounded environment 
containing static obstacles (black lines in Figure [I]). Agents also perceive other 
agents as obstacles. 

2 It is assumed that sampling is performed without replacement. 
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All the agents in the swarm are morphologically homogeneous, i.e. they have 
the same physical properties, sensors and motors, and only differ in the parameters 
of their respective controllers. Each agent has 8 obstacle proximity sensors evenly 
distributed around the agent, and 8 food item sensors are added in the case of the 
foraging task. An item sensor measures the distance to the closest item in the direc¬ 
tion of the sensor. These simulated agents are similar to Khepera or e-puck robots. 

We use a recurrent neural network as the architecture of the neuro-controllers 
of the agents (Figure [l]). The inputs of the network are the activation values of all 
sensors and the 2 outputs correspond to the translational and rotational velocities 
of the agent. The activation function of the output neurons is a hyperbolic tangent, 
taking values in [—1, +1], Two bias connections (one for each output neuron), as well 
as 4 recurrent connections (previous speed and previous rotation for both outputs) 
are added. This setup yields 22 connection weights for the navigation task and 38 
for the foraging task in the neuro-controller. The chromosome of the controller is 
the vector of these weights. Table Q] summarizes the different parameters used in our 
experiments. 



Figure 1: Left: the simulation environment containing agents (red dots with thin hair¬ 
lines representing sensors), obstacles (dark lines) and food items (blue dots). Right: the 
architecture of the neuro-controller. 

In the navigation task, agents must learn to move as fast and straight as possible 
in the environment while avoiding obstacles, whereas in the foraging task, agents 
must collect food items present in the environment (Figure [I]). An item is collected 
when an agent passes over it, at which time it is replaced by another item at a 
random location. 

We define the fitness function for the navigation task after the one introduced in 
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Table 1: Experimental settings. 


Experiments 

Number of food items 150 

Swarm size 

50 agents 

Exp. length 

5 x 10 5 sim. steps 

Number of runs 

30 

Evolution 

Evolution length 

~ 250 generations 

T e 

2000 — rand{ 0, 500) sim. steps 

T\ 

200 sim. steps 

Chromosome size 

Nav.: 22, Forag.: 38 

Mutation step-size 

ct = 0.5 


(0)- Each agent r computes its fitness at generation g as: 

T e 

fr = X^W ' t 1 ~ MOD ' min(a s (t )) (1) 

t =l 

where vt ( t ), (t) and a s (t) are respectively the translational velocity, the rotational 
velocity and the activations of the obstacle sensors of the agent at each time-step t 
of its evaluation phase. 

In the foraging task, a controller’s fitness is computed as the number of items 
collected during its evaluation phase. Furthermore, since we are interested in the 
performance of the entire swarm, we define the swarm fitness as the sum of the 
individual fitness of all agents at each generation: 

F s (g) = X fr (2) 

r£swarm 


4.2 Measures 

A characteristic of on-line ER is that agents learn as they are performing the ac¬ 
tual task in an open-ended way. In this context, the best fitness ever reached by 
the swarm is not a reliable measure, since it only reflects a ’’good” performance at 
one point of the evolution. Furthermore, fitness evaluation is inherently noisy, due 
to different evaluation conditions encountered by the agents. Therefore, we intro¬ 
duce four measures that will be used to compare the impact of the different selection 
methods. These measures summarize information on the swarm spanning over sev¬ 
eral generations. They are used only for evaluation and comparison of the selection 
methods and are computed once the evolution has ended. A pictorial description of 
these four measures is shown in Figure [2j 

• Average accumulated swarm fitness (/ c ): is the average swarm fitness in the 
last generations. This metric reflects the performance of the swarm at the end 
of the evolution. In our experiments, we compute the average over the last 8% 
generations. 

• Fixed budget swarm fitness (/&): is the swarm fitness reached at a certain 
generation (computational budget). This measure helps to compare different 
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Figure 2: A pictorial description of the four comparison measures. From top to bottom 
and left to right: the average accumulated swarm fitness, the fixed budget swarm fitness, 
the time to reach target and the accumulated fitness above target. 

methods on the same grounds. In our experiments, we measure this value at 
92% of the evolution, which corresponds to the first generation considered in 
the computation of f c . 

• Time to reach target ( gf ): is the first generation at which a predefined target 
fitness is reached. If this level is never reached, gf corresponds to the last 
generation. We fixed the target at 80% of the maximum fitness reached over 
all runs and all selection methods. This metric reflects a certain convergence 
rate of the algorithms, i.e. how fast the swarm hits the target fitness on the 
task at hand. 

• Accumulated fitness above target (/ a ): is the sum of all swarm fitness values 
above a predefined target fitness. It reflects to which extent the target level is 
exceeded and if this performance is maintained over the long run. We used the 
same target fitness as with gf. 

These comparison measures are not to be taken individually. For instance f c and 
fb complement each other and give an indication of the level and stability of the 
performance reached by the swarm at the end of evolution. If fb and f c are close 
then performance of the swarm is stable. Also, gf and f a combined reflect how fast 
a given fitness level is reached and to which extent that level is exceeded. Adding 
the two latter measures to f c shows if that trend is maintained. 
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4.3 Results and discussion 


For both navigation and foraging tasks, we run 30 independent runs for each selec¬ 
tion method, and we measured F s at each generation in all runs. Figures [3] and [4] 
show the median F s per generation over the 30 runs for each task. We computed 
the four performance metrics in the case of navigation (Figure (5j) and of foraging 
(Figure [6]). For both tasks, we performed pairwise^ Mann-Whitney tests at 99% 
confidence on these measures, between the four selection methods. 



Figure 3: Median swarm fitness per generation over the 30 runs for the navigation task. 

On the one hand, upon analysis of Figure [3] and Figure [4J we observe that the 
swarm rapidly reaches a high fitness level in both tasks whenever there is a task- 
driven selection pressure, i.e. with Best, Rank-based or Binary tournament selec¬ 
tion. On the other hand, without any selection pressure (Random), learning is much 
slower. Furthermore, for the three former selection methods the algorithm reaches 
comparable levels of performance in terms of median values of the swarm fitness. 

An exception can be noted for Best selection in the foraging task, which outper¬ 
forms Rank-based and Binary tournament. 

Despite the lower performances achieved by Random, the swarm still manages 
to learn behaviors for both tasks. This can be seen in the increasing trend of the 

3 Pairwise in this context means all combinations of pairs of selection methods, six combinations in our 
case. 
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Figure 4: Median swarm fitness per generation over the 30 runs for the foraging task. 

median swarm fitness in Figure [3] and Figure [4] This result is expected on the navi¬ 
gation task. As it is the case in ([3]), environmental pressure drives evolution toward 
behaviors that maximize mating opportunities and thus behaviors that explore the 
environment, increasing the swarm fitness. 

The same trend is also observed on the foraging task. The improvement is slower 
but still present with Random selection. This could be explained by the fact that 
collecting items is a byproduct of maximizing mating opportunities. Agents collect 
items by chance while they navigate trying to mate. When inspecting the swarm 
in the simulator, we observed that, when selection pressure is present, the evolved 
behaviors drive the agents toward food items which means that the food sensors are 
in fact exploited. In other words, evolution drove the controllers to use these sensors. 
However, without any selection pressure (Random), there can not be a similar drive. 

We also observed this in the simulator: agents were not attracted by food items for 
Random selection. 

When we analyze the comparison measures we introduced earlier, similar trends 
are observed. Figure [5] (respectively Figure [6]) shows the box and whiskers plots of 
the four measures for each selection method over the 30 runs for the navigation task 
(respectively the foraging task). 

On the navigation task, the pairwise comparisons of the four measures, using 
Mann-Whitney tests at 99% confidence level, yield significant statistical difference 
between all selection methods except between Best and Rank-based (p-value=0.0795) 
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Figure 5: Box and whisker plots (30 independent runs) of the comparison measures 
for the four selection methods on the navigation task. From top to bottom and left to 
right: f c , j\. gj and f a . The label p > 0.01 indicates no statistical difference for the 
corresponding two selection methods. 

and between Rank-based and Binary tournament (p-value 0.0116) in the case of the 
time to reach target (<7/). 

We also observe that Best reaches a higher swarm fitness for the fixed budget 
than the rest of selection methods, and this level is maintained at the end of evolu¬ 
tion, as is shown in f c and fb (upper left and right in the figure). The target fitness 
level is rapidly reached for the three methods inducing selection pressure, and there 
is not significant difference between Best and Rank-based , nor between Rank-based 
and Binary Tournament w.r.t gf (lower left). Furthermore, in the case of Best, the 
required level is not only reached but surpassed during the entire evolution, leading 
to a value of f a much higher than the ones of the rest of selection methods (lower 
right). However, this is not the case for Random selection that has much lower fb 
(upper right) and f c (upper left), and does not reach the target fitness level on more 
than half the runs that were launched (lower left and right). 

On the foraging task, there is a significant difference for all pairwise comparisons, 
except between Binary Tournament and Random in the case of the time to reach 
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Figure 6: Box and whisker plots (30 independent runs) of the comparison measures for 
the four selection methods on the foraging task. From top to bottom and left to right: f c , 
fb, gj and f a ■ The label p > 0.01 indicates no statistical difference for the corresponding 
two selection methods. 

target, gj, and the accumulated fitness above target, f a , (p-value—0.0419 in both 
cases). This is explained by the fact that very few runs attained the target fitnes^l in 
which case gj is the last generation and f a is almost zero. There is also no statistical 
difference between Rank-based and Binary Tournament on the fixed budget swarm 
fitness, fb, (p-value—0.0105). This means that Binary Tournament reaches a fitness 
at the given budget that is comparable to the one of Rank-based , but it does not 
maintain this level, since for these two methods the difference is significant on f c . 

Best also gives better results on the foraging task: a high swarm fitness is reached 
and maintained at the end of evolution (fb and f c , upper left and right). It surpasses 
the target fitness level in almost all runs much faster and to a larger extent than 
Rank-based, that also manages to reach the required level for most runs (gf, lower 
left), although by a lower level ( f a , lower right). This is not the case for Tournament 
and Random that do not achieve the target fitness level for most runs (lower left and 

4 For both tasks, the target fitness is 80% of the highest fitness reached by all methods during all runs. 
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right). 

We can observe that all task-driven selection pressures yield much better per¬ 
formances on both tasks compared to Random selection. Consequently, we may 
conclude that selection pressure has a positive impact on performances, when solv¬ 
ing a given task, and when the objective is not only to achieve adaptation of the 
swarm as it was the original motivation of mEDEA. Further, statistical tests show a 
direct correlation between the selection pressure and the performances achieved by 
the swarm on the two considered tasks. In other words, the stronger the selection 
pressure, the better the performances reached by the swarm. 

In general, it has been argued that elitist strategies are not desirable in tradi¬ 
tional EA’s, and the same argument holds for traditional ER. This is due to the fact 
that elitist strategies may lead to a premature convergence at local optima. There 
exists an extensive body of work, especially in non-convex optimization, where it 
is preferable to explicitly maintain a certain level of diversity in the population to 
escape local optima and to deal with the exploration vs. exploitation dilemma. This 
requirement is perhaps not as strong in the context of distributed ER as our ex¬ 
periments show. Selection is performed among a portion of the population at the 
agent level, therefore, one might argue that these algorithms already maintain a cer¬ 
tain level of diversity inherent to the fact that sub-populations are distributed on 
the different agents. Comparisons with other approaches in which separated sub¬ 
populations are evolved, such as spatially structured EA’s ([14]) and island models 
( 0 ). could give further insights on the dynamics of this kind of evolution. 


5 Conclusions 

In this paper, we studied the impact of task-driven selection pressures in on-line 
distributed ER for swarm behavior learning. This kind of algorithms raises several 
questions concerning the usefulness of selection pressure (partial views of popula¬ 
tion, noisy fitness values, etc.). We compared four selection methods inducing dif¬ 
ferent intensities of selection pressure on two tasks: navigation with obstacle avoid¬ 
ance and collective foraging. Our experiments show that selection pressure largely 
improves performances, and that the intensity of the selection operator positively 
correlates with the performances of the swarm. 

Foraging and navigation can be considered as relatively simple tasks and we be¬ 
lieve that more complex and challenging ones, involving deceptive fitness functions, 
could give further insights on selection and evolution dynamics in the distributed 
case. 
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